SCRec: A Scalable Computational Storage System with Statistical Sharding and Tensor-train Decomposition for Recommendation Models
Yang, Jinho, Kim, Ji-Hoon, Kim, Joo-Young
Abstract -- Deep Learning Recommendation Models (DLRMs) play a crucial role in delivering personalized content across web applications such as social networking and video streaming. However, with improvements in performance, the parameter size of DLRMs has grown to terabyte (TB) scales, accompanied by memory bandwidth demands exceeding TB/s levels. Furthermore, the workload intensity within the model varies based on the target mechanism, making it difficult to build an optimized recommendation system. In this paper, we propose SCRec, a scalable computational storage recommendation system that can handle TB-scale industrial DLRMs while guaranteeing high bandwidth requirements. SCRec utilizes a software framework that features a mixed-integer programming (MIP)-based cost model, efficiently fetching data based on data access patterns and adaptively configuring memory-centric and compute-centric cores. Additionally, SCRec integrates hardware acceleration cores to enhance DLRM computations, particularly allowing for the high-performance reconstruction of approximated embedding vectors from the extremely compressed tensor-train (TT) format. By combining its software framework and hardware accelerators, while eliminating data communication overhead through its single-server implementation, SCRec achieves substantial improvements in DLRM inference performance. It delivers up to 55.77x speedup compared to a CPU-DRAM system with no loss in accuracy and up to 13.35x energy efficiency gains over a multi-GPU system.
INTRODUCTION
Recommendation systems are widely used in social network services and video streaming platforms to provide personalized and preferred content to consumers, as described in Fig. 1.
They are also employed in search engines to offer differentiated search services [1]-[5]. For example, more than 80% of Meta's data center resources are allocated to recommendation system inference, while over 50% are utilized for training these systems [6]. Traditional recommendation systems relied on collaborative filtering techniques, such as content filtering using matrix factorization [7]-[10]. However, with advancements in deep neural networks (DNNs), deep learning recommendation models (DLRMs) that combine embedding tables (EMBs) and DNNs have emerged. This combination has demonstrated superior recommendation performance, making DLRM the industry standard in recommendation systems. These models are widely adopted in data centers, with recent focuses on both software-level and hardware-level optimizations [11]-[17]. This work was supported by Samsung Electronics Co., Ltd.
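SCRec's key compression idea, reconstructing embedding rows on the fly from TT cores, can be sketched in NumPy. All shapes, ranks, and the three-core split below are hypothetical illustrations, not SCRec's actual configuration:

```python
import numpy as np

# Hypothetical embedding table with N = 8*8*8 = 512 rows and D = 4*4*4 = 64
# columns, stored as three TT cores G_k of shape (r_{k-1}, n_k, d_k, r_k),
# with boundary ranks r_0 = r_3 = 1.
n, d, r = (8, 8, 8), (4, 4, 4), (1, 16, 16, 1)
rng = np.random.default_rng(0)
cores = [rng.standard_normal((r[k], n[k], d[k], r[k + 1])) * 0.1
         for k in range(3)]

def tt_embedding_lookup(row):
    """Reconstruct one embedding row by contracting one index slice per core."""
    # Decompose the flat row index into per-core indices (i1, i2, i3).
    idx = []
    for nk in reversed(n):
        idx.append(row % nk)
        row //= nk
    idx.reverse()
    # Each slice G_k[:, i_k, :, :] has shape (r_{k-1}, d_k, r_k);
    # chain-contract the shared rank dimension, keeping the d axes.
    out = cores[0][:, idx[0], :, :]              # (1, d1, r1)
    for k in (1, 2):
        slice_k = cores[k][:, idx[k], :, :]      # (r_{k-1}, d_k, r_k)
        out = np.einsum('adb,bec->adec', out, slice_k)
        out = out.reshape(1, -1, out.shape[-1])  # merge d axes, keep rank axis
    return out.reshape(-1)                       # length d1*d2*d3 = 64

vec = tt_embedding_lookup(123)
```

With these toy shapes the full 512x64 table holds 32,768 values while the three cores total 9,216, and a lookup touches only one index slice per core; the actual compression SCRec reports comes from far larger tables and tuned ranks.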
Analog Bayesian neural networks are insensitive to the shape of the weight distribution
Patel, Ravi G., Xiao, T. Patrick, Agarwal, Sapan, Bennett, Christopher
Recent work has demonstrated that Bayesian neural networks (BNNs) trained with mean-field variational inference (MFVI) can be implemented in analog hardware, promising orders-of-magnitude energy savings compared to standard digital implementations. However, while Gaussians are typically used as the variational distribution in MFVI, it is difficult to precisely control the shape of the noise distributions produced by sampling analog devices. This paper introduces a method for MFVI training that uses real device noise as the variational distribution. Furthermore, we demonstrate empirically that the predictive distributions of BNNs with the same weight means and variances converge to the same distribution, regardless of the shape of the variational distribution. This result suggests that analog device designers do not need to consider the shape of the device noise distribution when implementing BNNs that perform MFVI in hardware.
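The shape-insensitivity claim can be probed with a toy sketch: draw weights by reparameterization with two different zero-mean, unit-variance noise distributions and compare the resulting Monte Carlo predictions. The one-layer model, the parameter values, and the uniform surrogate for "device noise" below are assumptions of this sketch, and it checks only the first moment of the predictive distribution, not the full convergence result of the paper:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy one-layer "BNN": y = x @ w, with w drawn around per-weight means and
# standard deviations (both hypothetical here, standing in for MFVI output).
mu = rng.standard_normal((4, 1))
sigma = 0.1 * np.ones((4, 1))
x = rng.standard_normal((256, 4))

def predictive_mean(noise_sampler, n_samples=5000):
    """Monte Carlo predictive mean, with an arbitrary zero-mean,
    unit-variance noise distribution in the reparameterization step."""
    total = np.zeros((x.shape[0], 1))
    for _ in range(n_samples):
        w = mu + sigma * noise_sampler(mu.shape)  # reparameterized weight draw
        total += x @ w
    return total / n_samples

gaussian = predictive_mean(lambda s: rng.standard_normal(s))
# Uniform noise rescaled to unit variance: Var(U[-a, a]) = a^2 / 3.
uniform = predictive_mean(lambda s: rng.uniform(-np.sqrt(3.0), np.sqrt(3.0), s))
```

Up to Monte Carlo error, both predictive means agree with each other (and with the deterministic x @ mu), mirroring the paper's observation that only the first two moments of the weight noise matter.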
PIFS-Rec: Process-In-Fabric-Switch for Large-Scale Recommendation System Inferences
Huo, Pingyi, Devulapally, Anusha, Maruf, Hasan Al, Park, Minseo, Nair, Krishnakumar, Arunachalam, Meena, Akbulut, Gulsum Gudukbay, Kandemir, Mahmut Taylan, Narayanan, Vijaykrishnan
Deep Learning Recommendation Models (DLRMs) have become increasingly popular and prevalent in today's datacenters, consuming most of the AI inference cycles. The performance of DLRMs is heavily influenced by available bandwidth due to their large vector sizes in embedding tables and concurrent accesses. To achieve substantial improvements over existing solutions, novel approaches towards DLRM optimization are needed, especially in the context of emerging interconnect technologies like CXL. This study delves into exploring CXL-enabled systems, implementing a process-in-fabric-switch (PIFS) solution to accelerate DLRMs while optimizing their memory and bandwidth scalability. We present an in-depth characterization of industry-scale DLRM workloads running on CXL-ready systems, identifying the predominant bottlenecks in existing CXL systems. We therefore propose PIFS-Rec, a PIFS-based scheme that implements near-data processing through downstream ports of the fabric switch. PIFS-Rec achieves a latency that is 3.89x lower than Pond, an industry-standard CXL-based system, and also outperforms BEACON, a state-of-the-art scheme, by 2.03x.
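The bandwidth motivation behind near-data schemes like PIFS-Rec can be made concrete with back-of-envelope arithmetic. The pooling factor P and vector width D below are hypothetical, not taken from the paper; the point is only that reducing embedding rows before they cross the interconnect shrinks link traffic by the pooling factor:

```python
# Per-query link traffic for one embedding-bag lookup, assuming
# (hypothetically) P pooled rows per table and D-dim fp32 vectors.
P, D, FP32 = 80, 128, 4
host_side_gather = P * D * FP32  # ship every looked-up row to the host
near_data_pool = 1 * D * FP32    # sum the rows at the device, ship one vector

reduction = host_side_gather // near_data_pool  # traffic reduction factor
```

Under these assumed parameters the pooled-near-data path moves 80x fewer bytes per query; real gains depend on pooling factors, vector widths, and how much of the access stream the fabric-switch ports can absorb.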
MoNDE: Mixture of Near-Data Experts for Large-Scale Sparse Models
Kim, Taehyun, Choi, Kwanseok, Cho, Youngmock, Cho, Jaehoon, Lee, Hyuk-Jae, Sim, Jaewoong
Mixture-of-Experts (MoE) large language models (LLMs) have memory requirements that often exceed the GPU memory capacity, requiring costly parameter movement from secondary memories to the GPU for expert computation. In this work, we present Mixture of Near-Data Experts (MoNDE), a near-data computing solution that efficiently enables MoE LLM inference. MoNDE reduces the volume of MoE parameter movement by transferring only the $\textit{hot}$ experts to the GPU, while computing the remaining $\textit{cold}$ experts inside the host memory device. By replacing the transfers of massive expert parameters with transfers of small activations, MoNDE enables far more communication-efficient MoE inference, resulting in substantial speedups over existing parameter-offloading frameworks for both encoder and decoder operations.
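The hot/cold split can be sketched in NumPy. The expert sizes, the hot set, and the byte accounting below are illustrative assumptions, not MoNDE's measured configuration; the point is that cold experts exchange only activations while their (much larger) weights stay put:

```python
import numpy as np

rng = np.random.default_rng(2)
n_experts, d_model, d_ff = 8, 64, 256
# Hypothetical expert weights; in a real MoE LLM these dominate the memory.
experts = [rng.standard_normal((d_model, d_ff)) * 0.05
           for _ in range(n_experts)]
hot = {0, 1}  # experts assumed resident on the GPU (most frequently routed)

def moe_forward(tokens, assign):
    """Apply each token's assigned expert; cold experts run 'in place'
    (standing in for near-data execution), so their weights never move."""
    out = np.zeros((len(tokens), d_ff))
    for e in np.unique(assign):
        mask = assign == e
        out[mask] = tokens[mask] @ experts[int(e)]
    return out

def interconnect_bytes(assign, fp_bytes=2):
    """Link traffic if cold experts run near data (activations move)
    versus naively fetching their weights to the GPU."""
    cold = [int(e) for e in np.unique(assign) if int(e) not in hot]
    n_cold_tok = int(np.isin(assign, cold).sum())
    acts = n_cold_tok * (d_model + d_ff) * fp_bytes  # in + out activations
    weights = len(cold) * d_model * d_ff * fp_bytes  # cold expert params
    return acts, weights

tokens = rng.standard_normal((32, d_model))
assign = rng.integers(0, n_experts, 32)
y = moe_forward(tokens, assign)
acts, weights = interconnect_bytes(assign)
```

Even at these toy sizes the activation traffic is smaller than the cold experts' weight traffic; at LLM scale, where each expert is millions of parameters, the gap widens by orders of magnitude.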
Efficient and Economic Large Language Model Inference with Attention Offloading
Chen, Shaoyuan, Lin, Yutong, Zhang, Mingxing, Wu, Yongwei
Transformer-based large language models (LLMs) exhibit impressive performance in generative tasks but introduce significant challenges in real-world serving due to inefficient use of the expensive, computation-optimized accelerators. This mismatch arises from the autoregressive nature of LLMs, where the generation phase comprises operators with varying resource demands. Specifically, the attention operator is memory-intensive, exhibiting a memory access pattern that clashes with the strengths of modern accelerators, especially as context length increases. To enhance the efficiency and cost-effectiveness of LLM serving, we introduce the concept of attention offloading. This approach leverages a collection of cheap, memory-optimized devices for the attention operator while still utilizing high-end accelerators for other parts of the model. This heterogeneous setup ensures that each component is tailored to its specific workload, maximizing overall performance and cost efficiency. Our comprehensive analysis and experiments confirm the viability of splitting the attention computation over multiple devices. Also, the communication bandwidth required between heterogeneous devices proves to be manageable with prevalent networking technologies. To further validate our theory, we develop Lamina, an LLM inference system that incorporates attention offloading. Experimental results indicate that Lamina can provide 1.48x-12.1x higher estimated throughput per dollar than homogeneous solutions.
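A minimal single-query, single-head sketch of attention offloading, assuming a hypothetical 4096-token fp16 KV cache; Lamina's actual system batches many requests across multiple memory-optimized devices, which this does not model:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def attention_on_memory_device(q, k_cache, v_cache):
    """Runs where the KV cache lives; only q (inbound) and one output
    vector (outbound) ever cross the interconnect, never the cache."""
    scores = softmax(q @ k_cache.T / np.sqrt(q.shape[-1]))
    return scores @ v_cache

# Hypothetical decode step: 4096-token context, head dimension 128, fp16.
rng = np.random.default_rng(3)
T, D, FP16 = 4096, 128, 2
k_cache = rng.standard_normal((T, D))
v_cache = rng.standard_normal((T, D))
q = rng.standard_normal((1, D))
out = attention_on_memory_device(q, k_cache, v_cache)

link_bytes = (q.size + out.size) * FP16  # activations crossing the link
cache_bytes = 2 * T * D * FP16           # K and V if shipped instead
```

With these assumed sizes, keeping attention next to the cache moves thousands of times fewer bytes per decode step than shipping K and V to the accelerator, which is why the required inter-device bandwidth stays within reach of commodity networking.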
The Combination of Metal Oxides as Oxide Layers for RRAM and Artificial Intelligence
Resistive random-access memory (RRAM) is a promising candidate for next-generation memory devices due to its high speed, low power consumption, and excellent scalability. Metal oxides are commonly used as the oxide layer in RRAM devices due to their high dielectric constant and stability. However, to further improve the performance of RRAM devices, recent research has focused on integrating artificial intelligence (AI). AI can be used to optimize the performance of RRAM devices, while RRAM can also power AI as a hardware accelerator and in neuromorphic computing. This review paper provides an overview of the combination of metal-oxide-based RRAM and AI, highlighting recent advances in these two directions. We discuss the use of AI to improve the performance of RRAM devices and the use of RRAM to power AI. Additionally, we address key challenges in the field and provide insights into future research directions.
Neuromorphic memory device simulates neurons and synapses: Simultaneous emulation of neuronal and synaptic properties promotes the development of brain-like artificial intelligence
Neuromorphic computing aims to realize artificial intelligence (AI) by mimicking the mechanisms of the neurons and synapses that make up the human brain. Inspired by cognitive functions of the human brain that current computers cannot provide, neuromorphic devices have been widely investigated. However, current Complementary Metal-Oxide-Semiconductor (CMOS)-based neuromorphic circuits simply connect artificial neurons and synapses without synergistic interactions, and the concomitant implementation of neurons and synapses remains a challenge. To address these issues, a research team led by Professor Keon Jae Lee from the Department of Materials Science and Engineering implemented the biological working mechanisms of humans by introducing neuron-synapse interactions in a single memory cell, rather than the conventional approach of electrically connecting artificial neuronal and synaptic devices. The artificial synaptic devices studied previously were often used to accelerate parallel computations, similar to commercial graphics cards, which differs clearly from the operational mechanisms of the human brain.
Memory Association Networks
Kim, Seokjun, Jang, Jaeeun, Jang, Yeonju, Choi, Seongyune, Kim, Hyeoncheol
Various networks have been designed in the deep learning field to date. Typically, images, sounds, text, and hierarchical and relational data are learned through these networks, and inductive learning is performed. However, these networks are limited to specific datasets or specific tasks. We therefore designed artificial association networks that, like humans, can learn various datasets simultaneously in a single network. In a second study, we propose deductive association networks to perform deductive reasoning.